Chinese Document Retrieval at TREC-6

Author

  • Ross Wilkinson
Abstract

1 Multilingual Document Retrieval in TREC

The TREC-6 conference was the fourth year in which document retrieval in a language other than English was carried out. In TREC-3, four groups participated in an ad hoc retrieval task on a collection of 208 Mbytes of Mexican newspaper text in Spanish. In TREC-4, ten groups participated, once again in an ad hoc document retrieval task on the same Mexican newspaper texts but with new topics. In TREC-5 the Spanish ad hoc retrieval task moved to a new document corpus with new topics, and a corpus of documents and topics to support ad hoc retrieval in Chinese was introduced for the first time. In TREC-6 there were two tracks in which languages other than English were explored. In the Chinese track, a second set of topics was evaluated against the existing corpus. In the cross-lingual track, experiments were conducted in which queries in one language were used against a document corpus in another language. This report concentrates solely on the Chinese track.

2 Chinese Language

In the Chinese language each character represents at least a complete syllable, rather than a letter as in other languages. Many characters are also single-syllable words. The total number of characters is therefore quite large and somewhat ill-defined. A literate adult would typically recognise at least 5,000 to 6,000 characters. The various modern standards define between 10,000 and 12,000 characters, although if early and ancient literature is included the number rises to approximately 100,000. Chinese is agglutinating: there is no space between consecutive characters, except perhaps at the end of a sentence.
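Because the text carries no spaces between characters, a retrieval system cannot split it into words on whitespace. The abstract does not say how participants handled this, so the following is only a minimal sketch of one common workaround: indexing overlapping character n-grams instead of words. The function name `char_ngrams` and the sample string are assumptions made for illustration, not part of the original report.

```python
def char_ngrams(text, n=1):
    """Return overlapping character n-grams of `text`, ignoring whitespace.

    Hypothetical helper for illustration; not taken from the TREC-6 report.
    """
    # Drop any whitespace so n-grams never span a space or line break.
    chars = [c for c in text if not c.isspace()]
    # Slide a window of width n across the character sequence.
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


# Sample string meaning "Chinese information retrieval" (illustrative only).
sample = "中文信息检索"

unigrams = char_ngrams(sample, 1)  # one term per character
bigrams = char_ngrams(sample, 2)   # overlapping two-character terms

print(unigrams)
print(bigrams)
```

Single characters give very high recall but little precision, since many characters are words in their own right; overlapping bigrams approximate two-character words without needing a dictionary. This sketch treats punctuation like any other character, which a real indexer would probably filter out.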


Similar resources

ETH TREC-6: Routing, Chinese, Cross-Language and Spoken Document Retrieval

ETH Zurich's participation in TREC-6 consists of experiments in the main routing task, both manual and automatic runs in the Chinese retrieval track, cross-language retrieval in each of German, French and English as part of the new cross-language retrieval track, and experiments in speech recognition and retrieval under the new spoken document retrieval track. This year our routing experiments...


TREC-9 Cross-Language Information Retrieval (English-Chinese) Overview

Fredric Gey and Aitao Chen, UC DATA and SIMS, University of California, Berkeley (e-mail: gey@ucdata.berkeley.edu, aitao@sims.berkeley.edu). Abstract: Sixteen groups participated in the TREC-9 cross-language information retrieval track, which focussed on retrieving Chinese language documents in response to 25 English queries. A variety of CLIR approaches were tested and a rich ...


The TREC-6 Spoken Document Retrieval Track

The Text REtrieval Conference (TREC) workshops provide a forum for different groups to compare retrieval systems on common retrieval tasks. The 1997 TREC workshop will feature a Spoken Document Retrieval task for the first time. This paper motivates the task and describes the measures to be used to evaluate the effectiveness of the retrieval methodologies. 1. The Text REtrieval Conference The Text ...


Okapi Chinese Text Retrieval Experiments at TREC-6

The focus of the Okapi TREC-6 Chinese experiments is on investigating the effectiveness of different automatic indexing methods and phrase weighting for retrieval based on probabilistic models over Chinese text. We compare different probabilistic weighting methods based on a range of word and single character approaches. There are two indexing methods used in our experiments. One indexing method i...


Experiments on Proximity Based Chinese Text Retrieval in TREC 6

In TREC 6, we participate in the Chinese track and report our experiments on proximity based text retrieval. Our participation this year concentrates on automatic retrieval methods natural for the Chinese language. We index the documents by treating every Chinese character as a single term and store positional information for all terms. During retrieval we employ a proximity operator that uses ...




Publication date: 1997